Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
Abstract
Recent work on whole genome alignment has produced efficient tools for locating (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short, highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools depends heavily on the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g., viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following.
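As a rough illustration of the exact-match anchors mentioned in the abstract (this is not one of the paper's contributions, nor its algorithm), the sketch below finds maximal unique matches between two short sequences by brute force. The function names `count_occurrences` and `exact_mums` and the `min_len` parameter are hypothetical, introduced only for this example. Real tools rely on index structures such as suffix trees rather than a quadratic scan.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def exact_mums(a, b, min_len=3):
    """Return (pos_a, pos_b, length) for each exact maximal unique match.

    A match qualifies if it cannot be extended to the left or right and
    its string occurs exactly once in a and exactly once in b.
    Brute force: suitable only for short toy sequences.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            s = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, s) == 1 and count_occurrences(b, s) == 1:
                mums.append((i, j, k))
    return mums


if __name__ == "__main__":
    print(exact_mums("GATTACAGG", "CCATTACATT"))
```

Because the match must be exact, a single substitution inside a conserved region splits it into shorter pieces that may fall below `min_len` or stop being unique, which is one way to see why exact-match anchors can miss regions in fast-mutating genomes.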
Related resources
BatMis: a fast algorithm for k-mismatch mapping
MOTIVATION Second-generation sequencing (SGS) generates millions of reads that need to be aligned to a reference genome allowing errors. Although current aligners can efficiently map reads allowing a small number of mismatches, they are not well suited for handling a large number of mismatches. The efficiency of aligners can be improved using various heuristics, but the sensitivity and accuracy...
ProbeMatch: rapid alignment of oligonucleotides to genome allowing both gaps and mismatches
SUMMARY We have developed a tool, called ProbeMatch, for matching a large set of oligonucleotide sequences against a genome database using gapped alignments. Unlike most of the existing tools such as ELAND which only perform ungapped alignments allowing at most two mismatches, ProbeMatch generates both ungapped and gapped alignments allowing up to three errors including insertion, deletion and ...
Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA
Applied Biosystems’ SOLiD system offers a low-cost alternative to the traditional Sanger method of DNA sequencing. We introduce two main algorithms of mapping SOLiD’s color reads onto a reference genome. The first method performs mapping by adapting a greedy alignment framework. In such an alignment, reads are mapped to approximate genome positions, allowing for a pre-specified bound on sequenc...
GapMis-OMP: Pairwise Short-Read Alignment on Multi-core Architectures
Pairwise sequence alignment has received a new motivation due to the advent of next-generation sequencing technologies, particularly so for the application of re-sequencing—the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important pro...